RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

Abstract

We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without the need of a single isolated in-domain speech nor a noise waveform. Our approach overcomes limitations of previous methods which make them dependent on clean in-domain target signals and thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets which are used to train the student network. Vice versa, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also showcase that RemixIT can be combined with any separation model as well as be applied towards any semi-supervised and unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inside functioning of our self-training scheme, wherein the student model keeps obtaining better performance while observing severely degraded pseudo-targets.
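The remixing step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: `toy_teacher` is a hypothetical stand-in for a pre-trained enhancement model, and `ema_update` shows only one possible way a teacher could be refreshed from student parameters (a moving average, assumed here for illustration). The point is the data flow: the teacher splits each mixture into speech and noise estimates, the noise estimates are permuted across the batch, and the remixed signals together with their pseudo-targets become the student's training pairs.

```python
import numpy as np


def toy_teacher(mixtures):
    """Hypothetical stand-in for a pre-trained teacher model.

    Splits each mixture into a crude speech estimate and a noise estimate
    that sum back to the input. A real teacher would be a neural network.
    """
    est_speech = 0.8 * mixtures
    est_noise = mixtures - est_speech
    return est_speech, est_noise


def bootstrapped_remix(est_speech, est_noise, rng):
    """Permute noise estimates across the batch and remix them with speech.

    est_speech, est_noise: arrays of shape (batch, samples).
    Returns the bootstrapped mixtures and the pseudo-targets
    (speech estimates, permuted noise estimates) for the student.
    """
    perm = rng.permutation(est_speech.shape[0])
    permuted_noise = est_noise[perm]
    new_mixtures = est_speech + permuted_noise
    return new_mixtures, est_speech, permuted_noise


def ema_update(teacher_params, student_params, gamma=0.99):
    """Assumed teacher refresh rule: moving average of parameter dicts."""
    return {
        name: gamma * teacher_params[name] + (1.0 - gamma) * student_params[name]
        for name in teacher_params
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    in_domain_mixtures = rng.standard_normal((4, 16000))  # 4 one-second clips at 16 kHz
    speech_hat, noise_hat = toy_teacher(in_domain_mixtures)
    student_inputs, target_speech, target_noise = bootstrapped_remix(
        speech_hat, noise_hat, rng
    )
    print(student_inputs.shape, target_speech.shape, target_noise.shape)
```

A student would then be trained to recover `target_speech` and `target_noise` from `student_inputs`, using whatever regression loss the practitioner prefers.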

Similar resources

Bootstrapped Self Training for Knowledge Base Population

A central challenge in relation extraction is the lack of supervised training data. Pattern-based relation extractors suffer from low recall, whereas distant supervision yields noisy data which hurts precision. We propose bootstrapped self-training to capture the benefits of both systems: the precision of patterns and the generalizability of trained models. We show that training on the output of...

Speech Enhancement via EMD

In this study, two new approaches for speech signal noise reduction based on the empirical mode decomposition (EMD) recently introduced by Huang et al. (1998) are proposed. Based on the EMD, both reduction schemes are fully data-driven approaches. Noisy signal is decomposed adaptively into oscillatory components called intrinsic mode functions (IMFs), using a temporal decomposition called sifti...

Speech enhancement via energy separation

This work presents a novel technique to enhance speech signals in the presence of interfering noise. In this paper, the amplitude and frequency (AMFM) modulation model [7] and a multi-band analysis scheme [5] are applied to extract the speech signal parameters. The enhancement process is performed using a time-warping function (n) that is used to warp the speech signal. (n) is extracted from th...

Bootstrapped Training of Event Extraction Classifiers

Most event extraction systems are trained with supervised learning and rely on a collection of annotated documents. Due to the domain-specificity of this task, event extraction systems must be retrained with new annotated data for each domain. In this paper, we propose a bootstrapping solution for event role filler extraction that requires minimal human supervision. We aim to rapidly train a st...

Deep Exploration via Bootstrapped DQN

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as ε-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this ca...

Journal

Journal title: IEEE Journal of Selected Topics in Signal Processing

Year: 2022

ISSN: 1941-0484, 1932-4553

DOI: https://doi.org/10.1109/jstsp.2022.3200911